NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Efficiently Constructing Sparse Navigable Graphs

Conway, Alex; Dhulipala, Laxman; Farach-Colton, Martin; Johnson, Rob; Landrum, Ben; Musco, Christopher; Shechter, Yarin; Suel, Torsten; Wen, Richard (January 2026, ACM-SIAM Symposium on Discrete Algorithms (SODA).)

Full Text Available
Paging and the Address-Translation Problem

https://doi.org/10.1145/3737700

Bender, Michael A; Bhattacharjee, Abhishek; Conway, Alex; Farach-Colton, Martín; Johnson, Rob; Kannan, Sudarsun; Kuszmaul, William; Mukherjee, Nirjhar; Porter, Don; Tagliavini, Guido; et al (October 2025, ACM Transactions on Algorithms)

The classical paging problem, introduced by Sleator and Tarjan in 1985, formalizes the problem of caching pages in RAM in order to minimize IOs. Their online formulation ignores the cost of address translation: Programs refer to data via virtual addresses, and these must be translated into physical locations in RAM. Although the cost of an individual address translation is much smaller than that of an IO, every memory access involves an address translation, whereas IOs can be infrequent. In practice, one can spend money to avoid paging by over-provisioning RAM; in contrast, address translation is effectively unavoidable. Thus address-translation costs can sometimes dominate paging costs, and systems must simultaneously optimize both. To mitigate the cost of address translation, all modern CPUs have translation lookaside buffers (TLBs), which are hardware caches of common address translations. What makes TLBs interesting is that a single TLB entry can potentially encode the address translation formanyaddresses. This is typically achieved via the use of huge pages, which translate runs of contiguous virtual addresses to runs of contiguous physical addresses. Huge pages reduce TLB misses at the cost of increasing the IOs needed to maintain contiguity in RAM. This tradeoff between TLB misses and IOs suggests that the classical paging problem does not tell the full story. This article introduces the Address-Translation Problem, which formalizes the problem of maintaining a TLB, a page table, and RAM in order to minimize the total cost of both TLB misses and IOs. We present an algorithm that achieves the benefits of huge pages for TLB misses without the downsides of huge pages for IOs.
more » « less
Full Text Available
Tiny Pointers

https://doi.org/10.1145/3700594

Bender, Michael A; Conway, Alex; Farach-Colton, Martín; Kuszmaul, William; Tagliavini, Guido (October 2024, ACM Transactions on Algorithms)

This paper introduces a new data-structural object that we call the tiny pointer. In many applications, traditional\(\log n\)-bit pointers can be replaced with\(o(\log n)\)-bit tiny pointers at the cost of only a constant-factor time overhead and a small probability of failure. We develop a comprehensive theory of tiny pointers, and give optimal constructions for both fixed-size tiny pointers (i.e., settings in which all of the tiny pointers must be the same size) and variable-size tiny pointers (i.e., settings in which the average tiny-pointer size must be small, but some tiny pointers can be larger). If a tiny pointer references an item in an array filled to load factor\(1-\delta\), then the optimal tiny-pointer size is\(\Theta(\log\log\log n+\log\delta^{-1})\)bits in the fixed-size case, and\(\Theta(\log\delta^{-1})\)expected bits in the variable-size case. Our tiny-pointer constructions also require us to revisit several classic problems having to do with balls and bins; these results may be of independent interest. Using tiny pointers, we apply tiny pointers to five classic data-structure problems. We show that:A data structure storing\(n\)\(v\)-bit values for\(n\)keys with constant-factor time modifications/queries can be implemented to take space\(nv+O(n\log^{(r)}n)\)bits, for any constant\(r\gt0\), as long as the user stores a tiny pointer of expected size\(O(1)\)with each key—here,\(\log^{(r)}n\)is the\(r\)-th iterated logarithm.Any binary search tree can be made succinct, meaning that it achieves\((1+o(1))\)times the optimal space, with constant-factor time overhead, and can even be made to be within\(O(n)\)bits of optimal if we allow for\(O(\log^{*}n)\)-time modifications—this holds even for rotation-based trees such as the splay tree and the red-black tree.Any fixed-capacity key-value dictionary can be made stable (i.e., items do not move once inserted) with constant-factor time overhead and\((1+o(1))\)-factor space overhead.Any key-value dictionary that requires uniform-size values can be made to support arbitrary-size values with constant-factor time overhead and with an additional space consumption of\(\log^{(r)}n+O(\log j)\)bits per\(j\)-bit value for an arbitrary constant\(r\gt0\)of our choice.Given an external-memory array\(A\)of size\((1+\varepsilon)n\)containing a dynamic set of up to\(n\)key-value pairs, it is possible to maintain an internal-memory stash of size\(O(n\log\varepsilon^{-1})\)bits so that the location of any key-value pair in\(A\)can be computed in constant time (and with no IOs). In each case tiny pointers allow for us to take a natural space-inefficient solution that uses pointers and make it space-efficient for free.
more » « less
Full Text Available
Online List Labeling: Breaking the \({\log^2 n}\) Barrier

https://doi.org/10.1137/22M1534468

Bender, Michael A.; Conway, Alex; Farach-Colton, Martín; Komlós, Hanna; Kuszmaul, William; Wein, Nicole (October 2024, SIAM Journal on Computing)

Full Text Available
Nearly Optimal List Labeling

https://doi.org/10.1109/FOCS61266.2024.00132

Bender, Michael A; Conway, Alex; Farach-Colton, Martín; Komlós, Hanna; Koucký, Michal; Kuszmaul, William; Saks, Michael (October 2024, IEEE)

Full Text Available
Don't Melt Your Cache: Low-Associativity with Heat-Sink

https://doi.org/10.1145/3694906.3743303

Bender, Michael A; Conway, Alex; DeLayo, Daniel; Farach-Colton, Martin; Han, Jaehyun; He, Linfeng; Johnson, Rob; Kannan, Sudarsun; Kuszmaul, William; Porter, Donald; et al (July 2025, ACM)

Full Text Available
Adaptive Quotient Filters

https://doi.org/10.1145/3677128

Wen, Richard; McCoy, Hunter; Tench, David; Tagliavini, Guido; Bender, Michael A; Conway, Alex; Farach-Colton, Martin; Johnson, Rob; Pandey, Prashant (October 2024, Proceedings of the ACM on Management of Data)

Filters trade off accuracy for space and occasionally return false positive matches with a bounded error. Numerous systems use filters in fast memory to avoid performing expensive I/Os to slow storage. A fundamental limitation in traditional filters is that they do not change their representation upon seeing a false positive match. Therefore, the maximum false positive rate is only guaranteed for a single query, not for an arbitrary set of queries. We can improve the filter's performance on a stream of queries, especially on a skewed distribution, if we can adapt after encountering false positives. Adaptive filters, such as telescoping quotient filters and adaptive cuckoo filters, update their representation upon detecting a false positive to avoid repeating the same error in the future. Adaptive filters require an auxiliary structure, typically much larger than the main filter and often residing on slow storage, to facilitate adaptation. However, existing adaptive filters are not practical and have not been adopted in real-world systems for two main reasons. First, they offer weak adaptivity guarantees, meaning that fixing a new false positive can cause a previously fixed false positive to come back. Secondly, the sub-optimal design of the auxiliary structure results in adaptivity overheads so substantial that they can actually diminish overall system performance compared to a traditional filter. In this paper, we design and implement the \sysname, the first practical adaptive filter with minimal adaptivity overhead and strong adaptivity guarantees, which means that the performance and false-positive guarantees continue to hold even for adversarial workloads. The \sysname is based on the state-of-the-art quotient filter design and preserves all the critical features of the quotient filter such as cache efficiency and mergeability. Furthermore, we employ a new auxiliary structure design which results in considerably low adaptivity overhead and makes the \sysname practical in real systems. We evaluate the \sysname by using it to filter queries to an on-disk B-tree database and find no negative impact on insert or query performance compared to traditional filters. Against adversarial workloads, the \sysname preserves system performance, whereas traditional filters incur 2× slowdown from adversaries representing as low as 1% of the workload. Finally, we show that on skewed query workloads, the \sysname can reduce the false-positive rate 100× using negligible (1/1000th of a bit per item) space overhead.
more » « less
Full Text Available
Layered List Labeling

https://doi.org/10.1145/3651602

Bender, Michael A.; Conway, Alex; Farach-Colton, Martin; Komlos, Hanna; Kuszmaul, William (June 2024, PODS)

Full Text Available
Layered List Labeling

https://doi.org/10.1145/3651602

Bender, Michael A; Conway, Alex; Farach-Colton, Martin; Komlós, Hanna; Kuszmaul, William (May 2024, Proceedings of the ACM on Management of Data/47th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS))

The list-labeling problem is one of the most basic and well-studied algorithmic primitives in data structures, with an extensive literature spanning upper bounds, lower bounds, and data management applications. The classical algorithm for this problem, dating back to 1981, has amortized cost O(log bn). Subsequent work has led to improvements in three directions: low-latency (worst-case) bounds; high-throughput (expected) bounds; and (adaptive) bounds for important workloads. Perhaps surprisingly, these three directions of research have remained almost entirely disjoint---this is because, so far, the techniques that allow for progress in one direction have forced worsening bounds in the others. Thus there would appear to be a tension between worst-case, adaptive, and expected bounds. List labeling has been proposed for use in databases at least as early as PODS'99, but a database needs good throughput, response time, and needs to adapt to common workloads (e.g., bulk loads), and no current list-labeling algorithm achieve good bounds for all three. We show that this tension is not fundamental. In fact, with the help of new data-structural techniques, one can actually combine any three list-labeling solutions in order to cherry-pick the best worst-case, adaptive, and expected bounds from each of them.
more » « less
Full Text Available
Iceberg Hashing: Optimizing Many Hash-Table Criteria at Once

https://doi.org/10.1145/3625817

Bender, Michael A.; Conway, Alex; Farach-Colton, Martín; Kuszmaul, William; Tagliavini, Guido (December 2023, Journal of the ACM)

Despite being one of the oldest data structures in computer science, hash tables continue to be the focus of a great deal of both theoretical and empirical research. A central reason for this is that many of the fundamental properties that one desires from a hash table are difficult to achieve simultaneously; thus many variants offering different trade-offs have been proposed. This article introduces Iceberg hashing, a hash table that simultaneously offers the strongest known guarantees on a large number of core properties. Iceberg hashing supports constant-time operations while improving on the state of the art for space efficiency, cache efficiency, and low failure probability. Iceberg hashing is also the first hash table to support a load factor of up to1 - o(1)while being stable, meaning that the position where an element is stored only ever changes when resizes occur. In fact, in the setting where keys are Θ (logn) bits, the space guarantees that Iceberg hashing offers, namely that it uses at most\(\log \binom{|U|}{n} + O(n \log \ \text{log} n)\)bits to storenitems from a universeU, matches a lower bound by Demaine et al. that applies to any stable hash table. Iceberg hashing introduces new general-purpose techniques for some of the most basic aspects of hash-table design. Notably, our indirection-free technique for dynamic resizing, which we call waterfall addressing, and our techniques for achieving stability and very-high probability guarantees, can be applied to any hash table that makes use of the front-yard/backyard paradigm for hash table design.
more » « less
Full Text Available

« Prev Next »

Search for: All records